{
 "cells": [
  {
   "cell_type": "markdown",
   "source": [
    "Pandas是在NumPy基础上建立的新程序库，提供了一种高效的DataFrame数据结构。DataFrame本质上是一种带行标签和列标签、支持相同类型数据和缺失值的多维数组。Pandas不仅为带各种标签的数据提供了便利的存储界面，还实现了许多强大的操作，这些操作对数据库框架和电子表格程序的用户来说非常熟悉。"
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "Pandas对象简介  \n",
    "如果从底层视角观察Pandas对象，可以把它们看成增强版的NumPy结构化数组，行列都不再只是简单的整数索引，还可以带上标签。在本章后面的内容中我们将会发现，虽然Pandas在基本数据结构上实现了许多便利的工具、方法和功能，但是后面将要介绍的每一个工具、方法和功能几乎都需要我们理解基本数据结构的内部细节。  \n",
    "因此，在深入学习Pandas之前，先来看看Pandas的三个基本数据结构：Series、DataFrame和Index。  "
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "source": [
    "import pandas as pd\n",
    "data = pd.Series([0.25,0.75,1.0])\n",
    "data"
   ],
   "outputs": [
    {
     "output_type": "execute_result",
     "data": {
      "text/plain": [
       "0    0.25\n",
       "1    0.75\n",
       "2    1.00\n",
       "dtype: float64"
      ]
     },
     "metadata": {},
     "execution_count": 1
    }
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "从上面的结果中，你会发现Series对象将一组数据和一组索引绑定在一起，我们可以通过values属性和index属性获取数据。values属性返回的结果与NumPy数组类似"
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "source": [
    "data.values"
   ],
   "outputs": [
    {
     "output_type": "execute_result",
     "data": {
      "text/plain": [
       "array([0.25, 0.75, 1.  ])"
      ]
     },
     "metadata": {},
     "execution_count": 2
    }
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "source": [
    "data.index  #index属性返回的结果是一个类型为pd.Index的类数组对象"
   ],
   "outputs": [
    {
     "output_type": "execute_result",
     "data": {
      "text/plain": [
       "RangeIndex(start=0, stop=3, step=1)"
      ]
     },
     "metadata": {},
     "execution_count": 3
    }
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "source": [
    "data[1]  #index属性返回的结果是一个类型为pd.Index的类数组对象\n",
    "data[1:3]"
   ],
   "outputs": [
    {
     "output_type": "execute_result",
     "data": {
      "text/plain": [
       "1    0.75\n",
       "2    1.00\n",
       "dtype: float64"
      ]
     },
     "metadata": {},
     "execution_count": 4
    }
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "1. Serise是通用的NumPy数组  \n",
    "   到目前为止，我们可能觉得Series对象和一维NumPy数组基本可以等价交换，但两者间的本质差异其实是索引：NumPy数组通过隐式定义的整数索引获取数值，而Pandas的Series对象用一种显式定义的索引与数值关联。\n",
    "2. Series是特殊的字典  \n",
    "    你可以把Pandas的Series对象看成一种特殊的Python字典。字典是一种将任意键映射到一组任意值的数据结构，而Series对象其实是一种将类型键映射到一组类型值的数据结构。类型至关重要：就像NumPy数组背后特定类型的经过编译的代码使得它在某些操作上比普通的Python列表更加高效一样，Pandas Series的类型信息使得它在某些操作上比Python的字典更高效。\n",
    "3. 创建Series对象  \n",
    "   我们已经见过几种创建Pandas的Series对象的方法，都是像这样的形式：>>> pd.Series(data, index=index)其中，index是一个可选参数，data参数支持多种数据类型。"
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "source": [
    "data = pd.Series([0.25,0.5,0.75,1.0],\n",
    "                                        index=['a', 'b', 'c', 'd'])\n",
    "data"
   ],
   "outputs": [
    {
     "output_type": "execute_result",
     "data": {
      "text/plain": [
       "a    0.25\n",
       "b    0.50\n",
       "c    0.75\n",
       "d    1.00\n",
       "dtype: float64"
      ]
     },
     "metadata": {},
     "execution_count": 5
    }
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "source": [
    "data['a']"
   ],
   "outputs": [
    {
     "output_type": "execute_result",
     "data": {
      "text/plain": [
       "0.25"
      ]
     },
     "metadata": {},
     "execution_count": 6
    }
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "source": [
    "population_dict = {'California': 38332521,                      \n",
    "      'Texas': 26448193,                          \n",
    "        'New York': 19651127,                         \n",
    "           'Florida': 19552860,                        \n",
    "               'Illinois': 12882135}\n",
    "population = pd.Series(population_dict)\n",
    "population"
   ],
   "outputs": [
    {
     "output_type": "execute_result",
     "data": {
      "text/plain": [
       "California    38332521\n",
       "Texas         26448193\n",
       "New York      19651127\n",
       "Florida       19552860\n",
       "Illinois      12882135\n",
       "dtype: int64"
      ]
     },
     "metadata": {},
     "execution_count": 7
    }
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "用字典创建Series对象时，其索引默认按照顺序排列。典型的字典数值获取方式仍然有效：  \n",
    "和字典不同，Series对象还支持数组形式的操作，比如切片：\n"
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "source": [
    "population[\"California\":\"Illinois\"]"
   ],
   "outputs": [
    {
     "output_type": "execute_result",
     "data": {
      "text/plain": [
       "California    38332521\n",
       "Texas         26448193\n",
       "New York      19651127\n",
       "Florida       19552860\n",
       "Illinois      12882135\n",
       "dtype: int64"
      ]
     },
     "metadata": {},
     "execution_count": 8
    }
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "source": [
    "pd.Series([2,4,6])"
   ],
   "outputs": [
    {
     "output_type": "execute_result",
     "data": {
      "text/plain": [
       "0    2\n",
       "1    4\n",
       "2    6\n",
       "dtype: int64"
      ]
     },
     "metadata": {},
     "execution_count": 9
    }
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "source": [
    "pd.Series(5,index=[100,200,300])"
   ],
   "outputs": [
    {
     "output_type": "execute_result",
     "data": {
      "text/plain": [
       "100    5\n",
       "200    5\n",
       "300    5\n",
       "dtype: int64"
      ]
     },
     "metadata": {},
     "execution_count": 10
    }
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "source": [
    "pd.Series({2:'a', 1:'b', 3:'c'})"
   ],
   "outputs": [
    {
     "output_type": "execute_result",
     "data": {
      "text/plain": [
       "2    a\n",
       "1    b\n",
       "3    c\n",
       "dtype: object"
      ]
     },
     "metadata": {},
     "execution_count": 11
    }
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "source": [
    "pd.Series({2:'a', 1:'b', 3:'c'}, index=[3,2])"
   ],
   "outputs": [
    {
     "output_type": "execute_result",
     "data": {
      "text/plain": [
       "3    c\n",
       "2    a\n",
       "dtype: object"
      ]
     },
     "metadata": {},
     "execution_count": 12
    }
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "Pandas的DataFrame对象  \n",
    "Pandas的另一个基础数据结构是DataFrame。和上一节介绍的Series对象一样，DataFrame\n",
    "Pandas数据处理｜91既可以作为一个通用型NumPy数组，也可以看作特殊的Python字典。  \n",
    "1. DataFrame是通用的NumPy数组  \n",
    "2. DataFrame是特殊的字典  \n",
    "3. 创建DataFrame对象  \n",
    "4. "
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "source": [
    "area_dict = {'California': 423967, 'Texas': 695662, 'New York': 141297,             \n",
    " 'Florida': 170312, 'Illinois': 149995} \n",
    "area = pd.Series(area_dict) \n",
    "area"
   ],
   "outputs": [
    {
     "output_type": "execute_result",
     "data": {
      "text/plain": [
       "California    423967\n",
       "Texas         695662\n",
       "New York      141297\n",
       "Florida       170312\n",
       "Illinois      149995\n",
       "dtype: int64"
      ]
     },
     "metadata": {},
     "execution_count": 13
    }
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "source": [
    "states = pd.DataFrame({'population':population,\n",
    "                                                    'area':area})\n",
    "states"
   ],
   "outputs": [
    {
     "output_type": "execute_result",
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>population</th>\n",
       "      <th>area</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>California</th>\n",
       "      <td>38332521</td>\n",
       "      <td>423967</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Texas</th>\n",
       "      <td>26448193</td>\n",
       "      <td>695662</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>New York</th>\n",
       "      <td>19651127</td>\n",
       "      <td>141297</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Florida</th>\n",
       "      <td>19552860</td>\n",
       "      <td>170312</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Illinois</th>\n",
       "      <td>12882135</td>\n",
       "      <td>149995</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "            population    area\n",
       "California    38332521  423967\n",
       "Texas         26448193  695662\n",
       "New York      19651127  141297\n",
       "Florida       19552860  170312\n",
       "Illinois      12882135  149995"
      ]
     },
     "metadata": {},
     "execution_count": 14
    }
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "和Series对象一样，DataFrame也有一个index属性可以获取索引标签："
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "source": [
    "states.index"
   ],
   "outputs": [
    {
     "output_type": "execute_result",
     "data": {
      "text/plain": [
       "Index(['California', 'Texas', 'New York', 'Florida', 'Illinois'], dtype='object')"
      ]
     },
     "metadata": {},
     "execution_count": 15
    }
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "另外，DataFrame还有一个columns属性，是存放列标签的Index对象"
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "source": [
    "states.columns"
   ],
   "outputs": [
    {
     "output_type": "execute_result",
     "data": {
      "text/plain": [
       "Index(['population', 'area'], dtype='object')"
      ]
     },
     "metadata": {},
     "execution_count": 16
    }
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "source": [
    "states['area']"
   ],
   "outputs": [
    {
     "output_type": "execute_result",
     "data": {
      "text/plain": [
       "California    423967\n",
       "Texas         695662\n",
       "New York      141297\n",
       "Florida       170312\n",
       "Illinois      149995\n",
       "Name: area, dtype: int64"
      ]
     },
     "metadata": {},
     "execution_count": 17
    }
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "source": [
    "pd.DataFrame(population,columns=['population'])"
   ],
   "outputs": [
    {
     "output_type": "execute_result",
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>population</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>California</th>\n",
       "      <td>38332521</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Texas</th>\n",
       "      <td>26448193</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>New York</th>\n",
       "      <td>19651127</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Florida</th>\n",
       "      <td>19552860</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Illinois</th>\n",
       "      <td>12882135</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "            population\n",
       "California    38332521\n",
       "Texas         26448193\n",
       "New York      19651127\n",
       "Florida       19552860\n",
       "Illinois      12882135"
      ]
     },
     "metadata": {},
     "execution_count": 18
    }
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "source": [
    "data = [{'a':i,'b':2*i}\n",
    "                for i in range(3)]\n",
    "pd.DataFrame(data)"
   ],
   "outputs": [
    {
     "output_type": "execute_result",
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>a</th>\n",
       "      <th>b</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>2</td>\n",
       "      <td>4</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   a  b\n",
       "0  0  0\n",
       "1  1  2\n",
       "2  2  4"
      ]
     },
     "metadata": {},
     "execution_count": 19
    }
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "source": [
    "pd.DataFrame([{'a':1,'b':2}, {'b':3, 'c':4}])"
   ],
   "outputs": [
    {
     "output_type": "execute_result",
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>a</th>\n",
       "      <th>b</th>\n",
       "      <th>c</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1.0</td>\n",
       "      <td>2</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>NaN</td>\n",
       "      <td>3</td>\n",
       "      <td>4.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "     a  b    c\n",
       "0  1.0  2  NaN\n",
       "1  NaN  3  4.0"
      ]
     },
     "metadata": {},
     "execution_count": 20
    }
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "source": [
    "pd.DataFrame({'population':population,\n",
    "                                'area':area})"
   ],
   "outputs": [
    {
     "output_type": "execute_result",
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>population</th>\n",
       "      <th>area</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>California</th>\n",
       "      <td>38332521</td>\n",
       "      <td>423967</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Texas</th>\n",
       "      <td>26448193</td>\n",
       "      <td>695662</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>New York</th>\n",
       "      <td>19651127</td>\n",
       "      <td>141297</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Florida</th>\n",
       "      <td>19552860</td>\n",
       "      <td>170312</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Illinois</th>\n",
       "      <td>12882135</td>\n",
       "      <td>149995</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "            population    area\n",
       "California    38332521  423967\n",
       "Texas         26448193  695662\n",
       "New York      19651127  141297\n",
       "Florida       19552860  170312\n",
       "Illinois      12882135  149995"
      ]
     },
     "metadata": {},
     "execution_count": 21
    }
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "通过NumPy二维数组创建。假如有一个二维数组，就可以创建一个可以指定行列索引值的DataFrame。如果不指定行列索引值，那么行列默认都是整数索引值："
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "source": [
    "import numpy as np\n",
    "pd.DataFrame(np.random.rand(3,2),\n",
    "                                columns=['foo', 'bar'],\n",
    "                                index=['a','b','c'])"
   ],
   "outputs": [
    {
     "output_type": "execute_result",
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>foo</th>\n",
       "      <th>bar</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>a</th>\n",
       "      <td>0.275357</td>\n",
       "      <td>0.626552</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>b</th>\n",
       "      <td>0.562277</td>\n",
       "      <td>0.575679</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>c</th>\n",
       "      <td>0.498922</td>\n",
       "      <td>0.434010</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "        foo       bar\n",
       "a  0.275357  0.626552\n",
       "b  0.562277  0.575679\n",
       "c  0.498922  0.434010"
      ]
     },
     "metadata": {},
     "execution_count": 22
    }
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "source": [
    "A = np.zeros(3,dtype=[('A','i8'), ('B','f8')])\n",
    "A"
   ],
   "outputs": [
    {
     "output_type": "execute_result",
     "data": {
      "text/plain": [
       "array([(0, 0.), (0, 0.), (0, 0.)], dtype=[('A', '<i8'), ('B', '<f8')])"
      ]
     },
     "metadata": {},
     "execution_count": 23
    }
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "source": [
    "pd.DataFrame(A)"
   ],
   "outputs": [
    {
     "output_type": "execute_result",
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>A</th>\n",
       "      <th>B</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   A    B\n",
       "0  0  0.0\n",
       "1  0  0.0\n",
       "2  0  0.0"
      ]
     },
     "metadata": {},
     "execution_count": 24
    }
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "Pandas的Index对象  \n",
    "我们已经发现，Series和DataFrame对象都使用便于引用和调整的显式索引。Pandas的Index对象是一个很有趣的数据结构，可以将它看作是一个不可变数组或有序集合（实际上是一个多集，因为Index对象可能会包含重复值）。这两种观点使得Index对象能呈现一些有趣的功能。    \n",
    "1. 将Index看作不可变数组    \n",
    "2. 将Index看作有序集合  "
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "source": [
    "ind = pd.Index([2,3,5,7,11])\n",
    "ind "
   ],
   "outputs": [
    {
     "output_type": "execute_result",
     "data": {
      "text/plain": [
       "Int64Index([2, 3, 5, 7, 11], dtype='int64')"
      ]
     },
     "metadata": {},
     "execution_count": 25
    }
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "source": [
    "print(ind[1])\n",
    "print(ind[::2])\n",
    "print(ind.size, ind.shape, ind.ndim,ind.dtype)"
   ],
   "outputs": [
    {
     "output_type": "stream",
     "name": "stdout",
     "text": [
      "3\n",
      "Int64Index([2, 5, 11], dtype='int64')\n",
      "5 (5,) 1 int64\n"
     ]
    }
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "Index对象与NumPy数组之间的不同在于，Index对象的索引是不可变的，也就是说不能通过通常的方式进行调整："
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "source": [
    "indA = pd.Index([1,3,5,7,9])\n",
    "indB = pd.Index([2,3,5,7,11])\n",
    "indA & indB"
   ],
   "outputs": [
    {
     "output_type": "stream",
     "name": "stderr",
     "text": [
      "/tmp/ipykernel_22997/612271889.py:3: FutureWarning: Index.__and__ operating as a set operation is deprecated, in the future this will be a logical operation matching Series.__and__.  Use index.intersection(other) instead\n",
      "  indA & indB\n"
     ]
    },
    {
     "output_type": "execute_result",
     "data": {
      "text/plain": [
       "Int64Index([3, 5, 7], dtype='int64')"
      ]
     },
     "metadata": {},
     "execution_count": 28
    }
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "source": [],
   "outputs": [],
   "metadata": {}
  }
 ],
 "metadata": {
  "interpreter": {
   "hash": "0bedf3452ed48e38a2b75ff9f66ce8ae9ae2e92e8665cb618ee0797d2f46b9e5"
  },
  "kernelspec": {
   "name": "python3",
   "display_name": "Python 3.8.10 64-bit ('d2l-zh': conda)"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.10"
  },
  "orig_nbformat": 4
 },
 "nbformat": 4,
 "nbformat_minor": 2
}